| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| Core | exp_id | STRING | TRUE | Unique expedition identifier in the format `ISO3_YYYY` (e.g., `PNG_2024`). |
| | leg | STRING | TRUE | Cruise leg or operational phase (e.g., Leg 1, Caribbean vs. Pacific) |
| | survey_type | STRING | TRUE | Type of survey conducted. Allowed values: `uvs`, `sbruvs`, `pbruvs`, `sub`, `rov`, `dscm`, `bird`, `ysi`, `edna` |
| | ps_site_id | STRING | TRUE | Unique Pristine Seas site ID in the format `ISO3_YYYY_survey_###` (e.g., `PNG_2024_uvs_001`). |
| | location | STRING | TRUE | General area of the site (e.g., Gulf of Tribugá, Three Sisters, Duff Islands) |
| | sublocation | STRING | TRUE | Finer-scale geographic area within the location, such as an island, atoll, or bay (e.g., Ensenada de Utría, Bajo Nuevo) |
| | date | DATE | TRUE | Sampling date in `YYYY-MM-DD` format. |
| | time | TIME | TRUE | Local time of sampling (e.g., `14:30`). Format: 24-hour `HH:MM` |
| | lat | FLOAT | TRUE | Latitude in decimal degrees (e.g., `-0.7512`). Negative = south (WGS84) |
| | lon | FLOAT | TRUE | Longitude in decimal degrees (e.g., `-91.0812`). Negative = west (WGS84) |
| | team_lead | STRING | TRUE | Name of team lead or responsible field scientist |
| | notes | STRING | FALSE | Free-text notes describing the site |
3 Sites
Overview
The sites dataset is a core component of the Pristine Seas BigQuery database. It contains a collection of method-specific site tables that document the what, where, when, and who of fieldwork conducted during each scientific expedition. These tables provide a high-level summary of survey locations and sampling activity, capturing essential spatial, temporal, and logistical metadata.
Site tables are not designed to store method-specific sampling events such as transects, deployments, or replicates — those are handled separately in the corresponding method.stations tables within each method’s dataset.
A site represents a unique point in space and time where one or more scientific survey methods were conducted. Each site is uniquely identified by a standardized ps_site_id and serves as the fundamental spatial-temporal unit across the Pristine Seas database.
A site may contain one or more stations, each representing a specific sampling event. Stations may differ by:
- Method (e.g., fish BLT vs. benthic LPI conducted at the same UVS site)
- Depth stratum (e.g., submersible transects at different depths)
- Replicate (e.g., multiple pelagic BRUVS rigs deployed at a single site)
This hierarchical structure allows for rich, scalable representation of spatially and methodologically diverse sampling events.
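
The site-to-station relationship can be pictured as a small data model. The sketch below is illustrative only: the class names, the `station_id` values, and the station fields are assumptions made for the example, not the actual BigQuery schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Station:
    """One sampling event within a site (hypothetical fields)."""
    station_id: str                   # hypothetical identifier format
    method: str                       # e.g. "blt" or "lpi"
    depth_m: Optional[float] = None   # depth stratum, if relevant
    replicate: Optional[int] = None   # replicate number, if relevant


@dataclass
class Site:
    """A unique point in space and time, keyed by ps_site_id."""
    ps_site_id: str
    stations: list = field(default_factory=list)

    def methods(self) -> set:
        """Return the distinct methods sampled at this site."""
        return {s.method for s in self.stations}


# One UVS site with a fish belt transect and a benthic LPI station.
site = Site("PNG_2024_uvs_001")
site.stations.append(Station("PNG_2024_uvs_001_blt", method="blt", depth_m=10.0))
site.stations.append(Station("PNG_2024_uvs_001_lpi", method="lpi", depth_m=10.0))
```

Here a single `ps_site_id` carries two stations that differ only by method, which is the first of the three cases listed above.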
Core Site Schema
All site tables in the database share a core schema that defines the essential spatial and temporal metadata for each sampling location (Table 3.1). These fields represent the what, where, when, and who of data collection and are required across all site tables, regardless of method.
This standardized structure enables consistent quality control, supports spatial and temporal analysis, and facilitates integration of data across methods, expeditions, and years.
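
As an illustration of the kind of quality control the core schema makes possible, the following sketch checks one site row against the formats in Table 3.1. It is not the project's actual QC code. The function name is hypothetical, and two rules are assumptions drawn from the documented formats: that `###` means exactly three digits, and that the `ps_site_id` prefix and survey part should agree with `exp_id` and `survey_type`.

```python
import re
from datetime import datetime

SURVEY_TYPES = {"uvs", "sbruvs", "pbruvs", "sub", "rov", "dscm", "bird", "ysi", "edna"}

# ISO3_YYYY_survey_###, e.g. PNG_2024_uvs_001 (three digits assumed)
SITE_ID_RE = re.compile(r"^([A-Z]{3})_(\d{4})_([a-z]+)_(\d{3})$")


def check_core_fields(row: dict) -> list:
    """Return a list of problems found in one site row (empty if none)."""
    problems = []
    match = SITE_ID_RE.match(str(row.get("ps_site_id", "")))
    if match is None:
        problems.append("ps_site_id does not match ISO3_YYYY_survey_###")
    else:
        iso3, year, survey, _ = match.groups()
        # Assumed consistency rules between the ID and the other core fields.
        if f"{iso3}_{year}" != row.get("exp_id"):
            problems.append("ps_site_id prefix differs from exp_id")
        if survey != row.get("survey_type"):
            problems.append("ps_site_id survey part differs from survey_type")
    if row.get("survey_type") not in SURVEY_TYPES:
        problems.append("survey_type is not an allowed value")
    try:
        datetime.strptime(f"{row.get('date')} {row.get('time')}", "%Y-%m-%d %H:%M")
    except ValueError:
        problems.append("date/time must be YYYY-MM-DD and 24-hour HH:MM")
    lat, lon = row.get("lat"), row.get("lon")
    if not (isinstance(lat, (int, float)) and -90 <= lat <= 90):
        problems.append("lat must be between -90 and 90")
    if not (isinstance(lon, (int, float)) and -180 <= lon <= 180):
        problems.append("lon must be between -180 and 180")
    return problems


# Example row; the date is made up for illustration.
row = {
    "exp_id": "PNG_2024", "survey_type": "uvs", "ps_site_id": "PNG_2024_uvs_001",
    "date": "2024-05-17", "time": "14:30", "lat": -0.7512, "lon": -91.0812,
}
problems = check_core_fields(row)
```

Returning a list of messages rather than raising on the first failure lets a whole expedition's rows be screened in one pass.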
Method-Specific Tables
While all site tables share a standardized core schema, each sampling method introduces additional fields that capture metadata unique to that method. These method-specific fields provide essential contextual detail such as depth, platform type, habitat classification, or deployment parameters.
The following method-specific site tables are currently included in the sites dataset.
Underwater Visual Surveys
Underwater Visual Survey (UVS) sites represent the core spatial unit for SCUBA-based survey methods conducted during Pristine Seas expeditions. These methods include fish belt transects (BLT), benthic line point intercept (LPI), invertebrate counts, coral recruit surveys, and others.
In addition to the core site fields, the uvs_sites table includes two key controlled fields used to provide ecological and environmental interpretation of each site. These are:
habitat:
- fore reef: Outer slope of a reef, typically high-energy and wave-exposed.
- back reef: Protected area behind the reef crest, often calmer and more sheltered.
- fringing reef: Reef structure that grows directly from the shoreline.
- patch reef: Isolated, often small reef outcrops within a lagoon or sandy area.
- reef flat: Shallow, flat section of a reef, often exposed at low tide.
- channel: Natural passage between reef structures or through atolls.
- seagrass: Shallow marine habitat dominated by seagrass beds.
- rocky reef: Hard-bottom habitat composed primarily of rock.
- other: Habitat that does not fit predefined categories.
exposure:
- windward: Side of the island or reef facing prevailing winds and wave energy. Typically higher energy environments with more exposure to ocean swell.
- leeward: Sheltered side, facing away from prevailing winds. Typically calmer, with reduced wave action.
- lagoon: Located within a lagoon system, protected from direct oceanic exposure. Often shallow and calm, with restricted circulation.
- other: Exposure type does not fit standard categories (e.g., enclosed bays).
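
A controlled vocabulary is only useful if values are checked against it. The sketch below shows one way to flag out-of-vocabulary entries using the two lists above. The function name is hypothetical, and treating the match as case-sensitive is an assumption.

```python
# Values taken from the habitat and exposure lists above.
HABITATS = {
    "fore reef", "back reef", "fringing reef", "patch reef", "reef flat",
    "channel", "seagrass", "rocky reef", "other",
}
EXPOSURES = {"windward", "leeward", "lagoon", "other"}


def invalid_vocab(row: dict) -> dict:
    """Return {field: value} for controlled fields whose value is not allowed."""
    allowed = {"habitat": HABITATS, "exposure": EXPOSURES}
    return {
        name: row.get(name)
        for name, vocab in allowed.items()
        if row.get(name) not in vocab  # a missing field is reported as None
    }


ok = invalid_vocab({"habitat": "fore reef", "exposure": "windward"})
flagged = invalid_vocab({"habitat": "Fore Reef", "exposure": "lagoon"})
```

In this sketch `"Fore Reef"` is flagged because of its capitalization; a real pipeline might instead normalize case before comparing.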
Additional fields include a site_name (often used for repeat surveys), the name of the local community (where relevant), protection status, and flags indicating which UVS sub-methods were conducted at each site (Table 3.2).
uvs_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| uvs | site_name | STRING | FALSE | Site name used in prior surveys or local knowledge (e.g., TNC_2000_001, Punta Esperanza) |
| | habitat | STRING | TRUE | Dominant habitat type. Allowed: *fore reef*, *back reef*, *fringing reef*, *patch reef*, *reef flat*, *lagoon patch reef*, *channel*, *seagrass*, *rocky reef*, *other* |
| | exposure | STRING | TRUE | Wind and wave exposure at the site. Allowed: *windward*, *leeward*, *lagoon*, *other* |
| | community | STRING | FALSE | Nearest local community or population center to the site |
| | protected | BOOLEAN | FALSE | Whether the site is within a marine protected area (MPA) or Tambu |
| | blt | BOOLEAN | FALSE | Whether fish belt transects were done at this site |
| | lpi | BOOLEAN | FALSE | Whether benthic point intercept transects were done at this site |
| | ysi | BOOLEAN | FALSE | Whether a YSI environmental profile was done at this site |
| | inverts | BOOLEAN | FALSE | Whether invertebrate surveys were done at this site |
| | recruits | BOOLEAN | FALSE | Whether coral recruit surveys were done at this site |
| | e_dna | BOOLEAN | FALSE | Whether eDNA samples were collected at this site |
| | photomosaic | BOOLEAN | FALSE | Whether photomosaic imagery was collected at this site |
eDNA
The edna_sites table contains one row per environmental DNA (eDNA) sampling site. Each site represents a distinct point in space and time and serves as the primary spatial unit for eDNA fieldwork. Within a site, multiple water samples (replicates) may be collected across different depth strata, recorded in the corresponding edna.stations table.
In addition to the core site fields, the edna_sites table includes method-specific metadata (Table 3.3), such as:
- exposure: Same controlled vocabulary as in uvs_sites
- habitat: Same as uvs_sites, with the following additional categories:
  - open water: Offshore or pelagic environments
  - bay: Semi-enclosed coastal embayments
  - estuary: Transitional area between river and marine systems
  - mangrove: Shallow, intertidal forested coastal habitat
edna_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| edna | habitat | STRING | TRUE | Dominant habitat type. Allowed values: *fore reef*, *back reef*, *fringing reef*, *patch reef*, *reef flat*, *channel*, *seagrass*, *rocky reef*, *open water*, *bay*, *estuary*, *mangrove*, *other*. |
| | exposure | STRING | TRUE | Wind and wave exposure at the site. Allowed values: *windward*, *leeward*, *lagoon*, *other*. |
| | paired_ps_site_id | STRING | FALSE | `ps_site_id` of a paired site (e.g., a `uvs` or `pbruvs` site), if applicable |
| | n_stations | INTEGER | TRUE | Number of unique stations (i.e., depth strata) sampled at the site |
| | n_samples | INTEGER | TRUE | Total number of water samples (replicates) collected at the site |
| | site_photos | STRING | FALSE | Path to associated site photos, if available (e.g., eDNA/site_photos/COL-2022-edna-001) |
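
The two count fields, `n_stations` and `n_samples`, summarize sample-level records. The sketch below shows how they could be derived, assuming one input record per water sample with a `ps_site_id` and a `depth_strata` label. Those input field names, the function name, and the sample values are hypothetical, not taken from the edna.stations schema.

```python
from collections import defaultdict


def summarise_samples(samples: list) -> dict:
    """Derive n_stations and n_samples per site from sample-level records."""
    strata = defaultdict(set)   # site -> distinct depth strata (stations)
    counts = defaultdict(int)   # site -> total number of water samples
    for sample in samples:
        site = sample["ps_site_id"]
        strata[site].add(sample["depth_strata"])
        counts[site] += 1
    return {
        site: {"n_stations": len(strata[site]), "n_samples": counts[site]}
        for site in counts
    }


# Three replicates at one site: two at the surface, one near the bottom.
samples = [
    {"ps_site_id": "COL_2022_edna_001", "depth_strata": "surface"},
    {"ps_site_id": "COL_2022_edna_001", "depth_strata": "surface"},
    {"ps_site_id": "COL_2022_edna_001", "depth_strata": "bottom"},
]
summary = summarise_samples(samples)
```

For this input the site has two stations (one per depth stratum) and three samples in total.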
Seabed BRUVS
The sbruvs_sites table contains one row per seabed Baited Remote Underwater Video (sBRUV) deployment site. These sites represent individual stationary stereo-video deployments, typically conducted at depths from 10 to 70 meters.
Each site corresponds to a single BRUV deployment, meaning that site and station are effectively one-to-one for this method.
In addition to the core site schema, the sbruvs_sites table includes method-specific descriptors (Table 3.4):
- habitat: Same controlled vocabulary as uvs_sites, with the following additional values: bay, estuary, mangrove, sand flat
- exposure: Same vocabulary as uvs_sites.
Deployment-specific details such as depth, rig ID, and camera identifiers are stored in the associated sbruvs.stations table.
sbruvs_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| sbruvs | habitat | STRING | TRUE | Simplified habitat classification at the site. Allowed values: *fore reef*, *back reef*, *fringing reef*, *patch reef*, *reef flat*, *channel*, *seagrass*, *rocky reef*, *bay*, *estuary*, *mangrove*, *sand flat*, *other*. |
| | exposure | STRING | TRUE | Wind and wave exposure at the site. Allowed values: *windward*, *leeward*, *lagoon*, *other*. |
Pelagic BRUVS
Pelagic Baited Remote Underwater Video (pBRUV) sites represent open-water deployments of stereo-video systems used to survey pelagic fish communities. Each site corresponds to a single 5-rig deployment set, with each rig treated as a separate station. As such, the pbruvs_sites table contains one row per deployment set, while rig-specific data are recorded in the corresponding pbruvs.stations table.
In addition to the core site schema, the pbruvs_sites table summarizes deployment metadata across all five rigs in a standardized way (Table 3.5):
- n_rigs: Number of rigs deployed (typically 5)
- drift_m: Mean drift distance (meters) across rigs
- drift_hrs: Mean soak time (hours)
- uwa_string_id: String (site) identifier used by the University of Western Australia
Latitude and longitude represent the mean start position across all rigs, and time fields reflect the start time of the first rig. These values provide a spatial-temporal summary of the full deployment set.
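
The roll-up from rig-level records to a single site row can be sketched as follows. This is only an illustration of the summary rules described above. The rig-level field names (`lat_start`, `lon_start`, `time_start`, `soak_hrs`), the function name, and the numbers are hypothetical. A plain arithmetic mean of longitudes is also an assumption and would need care for a deployment straddling the antimeridian.

```python
from datetime import datetime
from statistics import mean


def summarise_deployment(rigs: list) -> dict:
    """Collapse rig-level records into one pbruvs_sites-style summary row."""
    # The site time is the start time of the earliest rig.
    first = min(rigs, key=lambda r: datetime.strptime(r["time_start"], "%H:%M"))
    return {
        "n_rigs": len(rigs),
        "lat": mean(r["lat_start"] for r in rigs),    # mean start position
        "lon": mean(r["lon_start"] for r in rigs),
        "time": first["time_start"],
        "drift_m": mean(r["drift_m"] for r in rigs),
        "drift_hrs": mean(r["soak_hrs"] for r in rigs),
    }


# Five rigs with made-up positions, start times, drift, and soak times.
rigs = [
    {"lat_start": -9.0, "lon_start": 150.0, "time_start": "08:10", "drift_m": 1000.0, "soak_hrs": 2.0},
    {"lat_start": -9.25, "lon_start": 150.25, "time_start": "08:00", "drift_m": 1200.0, "soak_hrs": 2.0},
    {"lat_start": -9.5, "lon_start": 150.5, "time_start": "08:20", "drift_m": 1400.0, "soak_hrs": 2.5},
    {"lat_start": -9.75, "lon_start": 150.75, "time_start": "08:30", "drift_m": 1600.0, "soak_hrs": 2.5},
    {"lat_start": -10.0, "lon_start": 151.0, "time_start": "08:40", "drift_m": 1800.0, "soak_hrs": 2.25},
]
site_row = summarise_deployment(rigs)
```

Note that the second rig, not the first in the list, supplies the site time because it started earliest.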
pbruvs_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| pbruvs | n_rigs | INTEGER | TRUE | Number of rigs deployed at the site (typically 5) |
| | drift_m | FLOAT | TRUE | Mean drift distance across all rigs, in meters (m). |
| | drift_hrs | FLOAT | TRUE | Mean deployment duration across all rigs, in hours (h). |
| | uwa_string_id | STRING | TRUE | String (site) identifier used by the University of Western Australia |
Birds
The bird_sites table contains one row per seabird survey transect. Each site represents the starting location and time of a vessel- or land-based transect during which seabird observations were recorded. Each site corresponds to a single station, representing the full transect.
Although transects are mobile, the ps_site_id is anchored to the start point of the transect to provide consistent spatial referencing across the dataset.
In addition to the core site schema, the bird_sites table includes a site-level descriptor for habitat, using a custom controlled vocabulary tailored to these surveys:
- open ocean – Offshore transects over deep water, far from land or coastal influence
- coastal – Nearshore waters along mainland or island coastlines
- inshore – Sheltered bays, estuaries, or nearshore zones with limited wave exposure
- island – Terrestrial habitats on offshore islands, often with seabird nesting colonies
- inland – Land-based habitats far from marine influence (e.g., wetlands, forest, grassland)
- other – Rare or unique environments not captured by the categories above
Transect-specific metadata — including platform type, duration, distance traveled, and species observations — are stored in the corresponding birds.stations and birds.observations tables.
bird_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| birds | habitat | STRING | TRUE | Broad classification of the survey environment. Allowed values: *open ocean*, *coastal*, *inshore*, *island*, *inland*, *other*. |
ROV
Each ROV (Remotely Operated Vehicle) deployment is represented by a single site with one or more associated stations. The site corresponds to the full ROV dive (deployment), while each station represents a horizontal transect or observational segment within the dive. This structure follows the standard Pristine Seas convention: sites capture high-level spatial and temporal metadata, while stations contain transect-specific sampling and observation data.
The rov_sites table records the core spatial and temporal metadata for each ROV deployment. Deployment start time (time_deploy) and coordinates (lat_deploy, lon_deploy) are used to populate the standardized core fields time, lat, and lon, ensuring consistency across methods.
Method-specific metadata—such as recovery time and coordinates, dive_type, max_depth_m, duration, and highlights—are retained within the rov_sites table (Table 3.7).
Transect-specific information, including start/end depth, time, coordinates, and observation notes, is stored in the corresponding rov.stations table.
rov_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| rov | dive_type | STRING | FALSE | Purpose of the dive (e.g., transect, exploration, sample collection) |
| | time_deploy | TIME | TRUE | Time ROV left the surface |
| | lat_deploy | FLOAT | TRUE | Latitude at ROV deployment |
| | lon_deploy | FLOAT | TRUE | Longitude at ROV deployment |
| | time_recovery | TIME | FALSE | Time ROV returned to the surface |
| | lat_recovery | FLOAT | FALSE | Latitude at ROV recovery |
| | lon_recovery | FLOAT | FALSE | Longitude at ROV recovery |
| | max_depth_m | FLOAT | TRUE | Maximum depth reached during the dive |
| | duration | TIME | FALSE | Total duration of the dive (hh:mm:ss) |
| | highlights | STRING | FALSE | Narrative summary or scientific highlights of the dive |
Submersible
Each submersible dive is represented by a single site with one or more associated stations. The site corresponds to the entire submersible deployment (dive), while each station represents a horizontal transect or visual survey segment conducted during that dive.
The sub_sites table captures the spatial, temporal, and operational context of each dive. In addition to the standardized core fields shared across all site tables, it includes method-specific metadata relevant to submersible operations, such as the submersible name (sub_name), dive_type (e.g., science, media, policy), depth_max_m, observers, pilot, and precise timestamps for key waypoints (e.g., time on bottom, surface recovery).
To maintain alignment with the shared site schema:
- The start of descent provides the time, lat, and lon used in the core fields.
- The primary scientific observer (observer_1) is mapped to team_lead.
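
The two alignment rules above amount to a field mapping. A minimal sketch, assuming dive records arrive as dictionaries keyed by the sub_sites field names; the function name, the mapping constant, and the sample values are hypothetical.

```python
# Core field -> dive-specific field that supplies its value.
CORE_FROM_DIVE = {
    "time": "time_descent",
    "lat": "lat_descent",
    "lon": "lon_descent",
    "team_lead": "observer_1",
}


def to_core_fields(dive: dict) -> dict:
    """Return a copy of the dive record with the shared core fields filled in."""
    core = {name: dive.get(source) for name, source in CORE_FROM_DIVE.items()}
    return {**dive, **core}  # the original dive-specific fields are kept as well


# Made-up dive record for illustration.
dive = {
    "sub_name": "Argonauta",
    "time_descent": "09:15",
    "lat_descent": 5.5301,
    "lon_descent": -87.0572,
    "observer_1": "A. Observer",
}
row = to_core_fields(dive)
```

Keeping the mapping in one dictionary makes the rule easy to audit, and the same pattern would apply to methods that populate the core fields from deployment coordinates.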
Transect-specific information, such as start/end depth, time, habitat descriptions, and notes, is stored in the corresponding sub.stations table.
sub_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| Submersible | sub_name | STRING | TRUE | Name of submersible used (e.g., Argonauta or DeepSee) |
| | dive_number | STRING | FALSE | Running sub dive number |
| | depth_max_m | FLOAT | TRUE | Maximum depth reached (m) |
| | duration | TIME | TRUE | Total dive duration (hh:mm:ss) |
| | temp_max_depth_c | FLOAT | FALSE | Temperature at maximum depth (°C) |
| | observer_1 | STRING | FALSE | Primary scientific observer |
| | observer2 | STRING | FALSE | Secondary observer (if any) |
| | pilot | STRING | FALSE | Submersible pilot |
| | dive_type | STRING | FALSE | Type of dive. Allowed values: science, media, policy, training |
| | collection | BOOLEAN | FALSE | Whether any biological collection occurred |
| | transect | BOOLEAN | FALSE | Whether transects were conducted |
| | edna | BOOLEAN | FALSE | Whether eDNA samples were collected |
| | time_descent | TIME | TRUE | Time when sub began descent |
| | lat_descent | FLOAT | TRUE | Latitude at start of descent |
| | lon_descent | FLOAT | TRUE | Longitude at start of descent |
| | time_on_bottom | TIME | FALSE | Time of first bottom contact |
| | lat_on_bottom | FLOAT | FALSE | Latitude at bottom contact |
| | lon_on_bottom | FLOAT | FALSE | Longitude at bottom contact |
| | time_off_bottom | TIME | FALSE | Time when sub left the bottom |
| | lat_off_bottom | FLOAT | FALSE | Latitude at lift-off |
| | lon_off_bottom | FLOAT | FALSE | Longitude at lift-off |
| | time_surface | TIME | FALSE | Time when sub surfaced |
| | lat_surface | FLOAT | FALSE | Latitude at surface recovery |
| | lon_surface | FLOAT | FALSE | Longitude at surface recovery |
Deep-Sea Cameras
Each deep-sea camera deployment is represented by a single site–station pair. In line with the Pristine Seas schema, the site captures the spatial and contextual metadata of the deployment, while the station represents the full observational unit — including technical specifications, environmental conditions, and recording parameters.
The dscm_sites table records the core spatial and temporal metadata for each deployment. Deployment time (time_deploy) and coordinates (lat_deploy, lon_deploy) populate the standard core fields time, lat, and lon, following conventions used across all methods.
Deployment-specific details — such as max_depth, bottom temperature, ambient water temperature, recovery time and position, and recording duration — are stored in the corresponding dscm.stations table.